{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. A Simple Neural Network\n",
    "\n",
    "## (a)\n",
    "\n",
    "Denote the weighted input to the hidden layer resp. the output layer by $z^{[1]}$ resp. $z^{[2]}$. Also denote the activation of the hidden layer by $a^{[1]}$. This means that for a sample $x= (x_1,x_2)$ we have\n",
    "\n",
    "$$\\begin{align*}\n",
    "z^{[1]}_j &= w^{[1]}_{0,j} + \\sum_{i=1}^2 w^{[1]}_{i,j}x_i &\\text{for } j\\in \\{1,2,3\\}\\\\\n",
    "a^{[1]}_j &= g(z^{[1]}_j)&\\text{for } j\\in \\{1,2,3\\}\\\\\n",
    "z^{[2]} &= w^{[2]}_{0} + \\sum_{j=1}^2 w^{[2]}_{j}a^{[1]}_j\\\\\n",
    "o &= g(z^{[2]})\n",
    "\\end{align*}$$\n",
    "where $g(u)=1/(1+\\exp(-u))$ is the sigmoid function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loss of a single example $x$ with label $y$ is $(o-y)^2$. \n",
    "\n",
    "Therefore the derivative of the cost with respect to $w^{[1]}_{1,2}$ is given by"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\begin{align*}\n",
    "\\frac{\\partial (o-y)^2}{\\partial w^{[1]}_{1,2}} &=\\frac{\\partial (o-y)^2}{\\partial o} \\cdot \\frac{\\partial o}{\\partial z^{[2]}}\\cdot \\frac {\\partial z^{[2]}} {\\partial a^{[1]}_2} \\cdot\\frac {\\partial a^{[1]}_2}{\\partial z^{[1]}_2}\\cdot\\frac {\\partial z^{[1]}_2}{\\partial w^{[1]}_{1,2}}\\\\\n",
    "&=2(o-y)\\cdot g'(z^{[2]})\\cdot w^{[2]}_2 \\cdot g'(z^{[1]}_2) \\cdot x_1\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With\n",
    "$$ l=\\frac 1m \\sum_{i=1}^m(o^{(i)}-y^{(i)})^2$$\n",
    "we then get the derivative of cost function as "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\frac{\\partial l}{\\partial w^{[1]}_{1,2}} = \\frac 2m\\sum_{i=1}^m (o^{(i)}-y^{(i)})\\cdot g'(z^{[2](i)})\\cdot w^{[2]}_2 \\cdot g'(z^{[1](i)}_2) \\cdot x^{(i)}_1,$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "with which we get the gradient descent update rule for $w^{[1]}_{1,2}$ as \n",
    "$$ w^{[1]}_{1,2} \\leftarrow w^{[1]}_{1,2} - \\alpha \\frac{\\partial l}{\\partial w^{[1]}_{1,2}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (b)\n",
    "\n",
    "Eyeballing from the plot, we find that a sample $x=(x_1,x_2)$ will be in class $y=0$ iff $x$ satisfies the following three inequalities:\n",
    "$$\\begin{align*} \n",
    "0.5 &< x_1, \\\\\n",
    "0.5 &< x_2, \\\\\n",
    "3.5 &> x_1+x_2,\n",
    "\\end{align*}$$\n",
    "i.e. iff $x$ lies inside of a certain triangle."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will define the weights between the input layer and the hidden layer in such a way that neuron $h_1$ activates iff $0.5 \\geq x_1$, neuron $h_2$ activates iff $0.5 \\geq x_2$ and neuron $h_3$ activates iff $3.5 \\leq x_1 +x_2$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This implies that if at least one of the three neurons in the hidden layer activated, then $x$ should be put into class $y=1$. \n",
    "\n",
    "Therefore we want the output neuron $o$ to activate iff $a^{[1]}_1 + a^{[1]}_2 + a^{[1]}_3 \\geq 1$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can achieve all of this with the following weights which will give perfect accuracy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "    w = {}\n",
    "\n",
    "    w['hidden_layer_0_1'] = 0.5\n",
    "    w['hidden_layer_1_1'] = -1\n",
    "    w['hidden_layer_2_1'] = 0\n",
    "    w['hidden_layer_0_2'] = 0.5\n",
    "    w['hidden_layer_1_2'] = 0\n",
    "    w['hidden_layer_2_2'] = -1\n",
    "    w['hidden_layer_0_3'] = -3.5\n",
    "    w['hidden_layer_1_3'] = 1\n",
    "    w['hidden_layer_2_3'] = 1\n",
    "\n",
    "    w['output_layer_0'] = -1\n",
    "    w['output_layer_1'] = 1\n",
    "    w['output_layer_2'] = 1\n",
    "    w['output_layer_3'] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (c)\n",
    "\n",
    "Using $f(x)=x$ in the hidden layer implies that the map $x\\mapsto a^{[1]}$ is an affine linear embedding of $x$ into $\\mathbb R^3$.  \n",
    "\n",
    "So to achieve perfect accuracy we would have to be able to linearly separate the embedded images of class 0 and class 1 in $\\mathbb R^3$.\n",
    "\n",
    "To see that this is impossible we just need to note class 0 is contained in the convex hull of class 1 and that this property is preserved by an affine linear embedding, which makes it impossible to separate the two by an affine hyperplane."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
